Modern visual-inertial navigation systems (VINS) face a critical challenge in real-world deployment: they need to run reliably and robustly in highly dynamic environments. Current best solutions merely filter dynamic objects out as outliers based on the semantics of their object class. Such an approach does not scale, because it requires the semantic classifier to cover every possibly-moving object class; this is hard to define, let alone deploy. On the other hand, many real-world environments exhibit strong structural regularities in the form of planes such as walls and ground surfaces, and exploiting these is equally crucial. We present RP-VIO, a monocular visual-inertial odometry system that leverages the simple geometry of these planes for improved robustness and accuracy in challenging dynamic environments. Since existing datasets contain only a limited number of dynamic elements, we also provide a highly dynamic, photorealistic synthetic dataset for a more effective evaluation of the capabilities of modern VINS systems. We evaluate our approach on this dataset and on three diverse sequences from standard datasets, including two real-world dynamic sequences, and show a significant improvement in robustness and accuracy over a state-of-the-art monocular visual-inertial odometry system. We also show, in simulation, an improvement over a simple dynamic-feature masking approach. Our code and dataset are publicly available.
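The abstract does not spell out the planar geometry being exploited, but a standard relation that a plane-based VIO could build on (shown here only as illustrative background, not as the paper's formulation) is the plane-induced homography between two views. For a plane with normal $\mathbf{n}$ and distance $d$ in the first camera frame, relative camera motion $(\mathbf{R}, \mathbf{t})$, and intrinsics $\mathbf{K}$,

$$\mathbf{H} = \mathbf{K}\left(\mathbf{R} - \frac{\mathbf{t}\,\mathbf{n}^{\top}}{d}\right)\mathbf{K}^{-1}, \qquad \mathbf{x}' \simeq \mathbf{H}\,\mathbf{x},$$

so any static feature $\mathbf{x}$ lying on the plane must map to $\mathbf{x}'$ in the second view; features that violate this constraint can be flagged as dynamic.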
Over the past decade, neural networks have achieved remarkable success on a wide variety of inverse problems, which has driven their adoption in disciplines ranging from medical imaging to seismic analysis. However, the high dimensionality of these inverse problems simultaneously leaves current theory, which predicts that networks should scale exponentially in the dimension of the problem, unable to explain why the seemingly small networks used in these settings work as well as they do in practice. To reduce this gap between theory and practice, this paper provides a general method for bounding the complexity required for a neural network to approximate a Lipschitz function on a high-dimensional set with a low-complexity structure. The approach is based on the observation that the existence of a linear Johnson-Lindenstrauss (JL) embedding $\mathbf{A} \in \mathbb{R}^{d \times D}$ of a given high-dimensional set $\mathcal{S} \subset \mathbb{R}^{D}$ into a low-dimensional cube $[-M, M]^{d}$ implies that for any Lipschitz function $f: \mathcal{S} \to \mathbb{R}^{p}$, there exists a Lipschitz function $g: [-M, M]^{d} \to \mathbb{R}^{p}$ such that $g(\mathbf{A}\mathbf{x}) = f(\mathbf{x})$ for all $\mathbf{x} \in \mathcal{S}$. Hence, if one has a neural network that approximates $g: [-M, M]^{d} \to \mathbb{R}^{p}$, a layer implementing the JL embedding $\mathbf{A}$ can be added to obtain a neural network that approximates $f: \mathcal{S} \to \mathbb{R}^{p}$. By pairing JL embedding results with approximation results for neural networks approximating Lipschitz functions, one then obtains results that bound the complexity required for a neural network to approximate Lipschitz functions on high-dimensional sets. The end result is a general theoretical framework that can then be used to better explain the empirical success of smaller networks on a broader range of inverse problems than current theory allows.
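As a concrete (hypothetical) illustration of the construction described above: prepend a fixed, non-trainable linear layer whose weights form a JL matrix $\mathbf{A}$ to a network approximating $g$ on the low-dimensional cube, so that the composition approximates $f$ on $\mathcal{S}$. The Gaussian JL matrix and the small MLP below are placeholders, not the paper's construction.

```python
import torch
import torch.nn as nn

D, d, p = 4096, 64, 10          # ambient dim, embedded dim, output dim (illustrative)

# Fixed random JL embedding A in R^{d x D} (Gaussian construction, i.i.d. N(0, 1/d)).
jl = nn.Linear(D, d, bias=False)
with torch.no_grad():
    jl.weight.copy_(torch.randn(d, D) / d ** 0.5)
jl.weight.requires_grad_(False)   # the embedding layer is not trained

# Placeholder network approximating g : [-M, M]^d -> R^p.
g_net = nn.Sequential(nn.Linear(d, 256), nn.ReLU(), nn.Linear(256, p))

# Network approximating f : S subset of R^D -> R^p, realized as x -> g(Ax).
f_net = nn.Sequential(jl, g_net)

x = torch.randn(8, D)             # batch of points standing in for elements of S
print(f_net(x).shape)             # torch.Size([8, 10])
```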
Can a machine learn machine learning? We propose to answer this question using the same criteria we use to answer the analogous question: can humans learn machine learning? We automatically answer MIT final exams in Introduction to Machine Learning at a human level. The course is a large undergraduate class with about five hundred students each semester. Recently, program synthesis and few-shot learning have solved university-level problem-set questions in mathematics and STEM courses at a human level. In this work, we solve questions from final exams, which differ from problem sets: the questions are longer, have multiple parts, are more complicated, and span a broader range of topics. We provide a new dataset and benchmark of eight MIT Introduction to Machine Learning final exams from Fall 2017 through Spring 2022, together with code for automatically answering these questions and generating new ones. We perform ablation studies comparing zero-shot learning with few-shot learning and chain-of-thought prompting, using GPT-3 pre-trained on text and fine-tuned on code, across a range of machine learning topics, and find that few-shot learning methods perform best. We make our data and code publicly available for the machine learning community.
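The snippet below is only a generic illustration of how a few-shot, chain-of-thought prompt of the kind compared in the abstract might be assembled for an exam question; the worked example, the question text, and the prompt wording are invented placeholders, not taken from the benchmark.

```python
# Minimal sketch: building a few-shot, chain-of-thought prompt for an exam question.
# The worked example and the question are invented placeholders.
few_shot_examples = [
    {
        "question": "A logistic regression model outputs 0.8 for a positive example. "
                    "What is the cross-entropy loss?",
        "reasoning": "The loss is -log(p) for the true class, so -log(0.8) is about 0.223.",
        "answer": "about 0.223",
    },
]

def build_prompt(question: str) -> str:
    parts = []
    for ex in few_shot_examples:
        parts.append(f"Question: {ex['question']}\n"
                     f"Let's think step by step. {ex['reasoning']}\n"
                     f"Answer: {ex['answer']}\n")
    parts.append(f"Question: {question}\nLet's think step by step.")
    return "\n".join(parts)

prompt = build_prompt("Why can a decision tree achieve zero training error on any "
                      "consistent dataset?")
print(prompt)  # this string would then be sent to a large language model
```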
Overparameterized neural networks tend to perfectly fit noisy training data yet still generalize well on test data. Inspired by this empirical observation, recent work has sought to understand this phenomenon of benign overfitting, or harmless interpolation, in the simpler setting of linear models. Previous theoretical work critically assumes that either the data features are statistically independent or the input data is high-dimensional; this precludes general nonparametric settings with structured feature maps. In this paper, we present a general and flexible framework for upper-bounding regression and classification risk in a reproducing kernel Hilbert space. A key contribution is that our framework describes precise sufficient conditions on the data Gram matrix under which harmless interpolation occurs. Our results recover existing independent-features results (with a much simpler analysis), but they also show that harmless interpolation can occur in more general settings, such as features that form a bounded orthonormal system. Furthermore, our results show an asymptotic separation between classification and regression performance in a manner that was previously only shown for Gaussian features.
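For context (this setup is standard kernel-methods background and only implied by the abstract, not quoted from it), the interpolating estimators in question are typically minimum-norm interpolants in the RKHS $\mathcal{H}$ with kernel $k$: given training data $(x_i, y_i)_{i=1}^{n}$ and assuming the Gram matrix $\mathbf{K}$ is invertible,

$$\hat f = \arg\min_{f \in \mathcal{H}} \|f\|_{\mathcal{H}} \ \text{ s.t. } \ f(x_i) = y_i \ \ \forall i, \qquad \hat f(x) = \sum_{i=1}^{n} \alpha_i\, k(x, x_i), \quad \boldsymbol{\alpha} = \mathbf{K}^{-1}\mathbf{y},$$

where $\mathbf{K}_{ij} = k(x_i, x_j)$ is the data Gram matrix whose properties govern when such interpolation is harmless.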
In speech recognition, it is essential to model the phonetic content of the input signal while discarding irrelevant factors such as speaker variations and noise, which is challenging in low-resource settings. Self-supervised pre-training has been proposed as a way to improve both supervised and unsupervised speech recognition, including frame-level feature representations and Acoustic Word Embeddings (AWE) for variable-length segments. However, self-supervised models alone cannot learn perfect separation of the linguistic content as they are trained to optimize indirect objectives. In this work, we experiment with different pre-trained self-supervised features as input to AWE models and show that they work best within a supervised framework. Models trained on English can be transferred to other languages with no adaptation and outperform self-supervised models trained solely on the target languages.
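A minimal sketch of the pipeline described above, with a deliberately simple mean-pooling encoder standing in for the AWE models studied (the feature dimensions and pooling choice are placeholders, not the paper's setup):

```python
import torch
import torch.nn.functional as F

def acoustic_word_embedding(frame_features: torch.Tensor) -> torch.Tensor:
    """Collapse a (num_frames, feat_dim) sequence of pre-trained frame-level
    features into one fixed-dimensional vector. Mean pooling is a simplistic
    stand-in for a learned AWE encoder."""
    return F.normalize(frame_features.mean(dim=0), dim=0)

# Two variable-length segments that would, hypothetically, contain the same word.
seg_a = torch.randn(47, 768)   # e.g. frame features from a self-supervised model
seg_b = torch.randn(63, 768)

similarity = torch.dot(acoustic_word_embedding(seg_a), acoustic_word_embedding(seg_b))
print(float(similarity))       # cosine similarity used for same-word discrimination
```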
Cybercriminals are moving towards zero-day attacks affecting resource-constrained devices such as single-board computers (SBC). Assuming that perfect security is unrealistic, Moving Target Defense (MTD) is a promising approach to mitigate attacks by dynamically altering target attack surfaces. Still, selecting suitable MTD techniques for zero-day attacks is an open challenge. Reinforcement Learning (RL) could be an effective approach to optimize the MTD selection through trial and error, but the literature falls short when it comes to i) evaluating the performance of RL and MTD solutions in real-world scenarios, ii) studying whether behavioral fingerprinting is suitable for representing SBCs' states, and iii) calculating the resource consumption on SBCs. To address these limitations, the work at hand proposes an online RL-based framework to learn the correct MTD mechanisms mitigating heterogeneous zero-day attacks in SBCs. The framework considers behavioral fingerprinting to represent SBCs' states and RL to learn MTD techniques that mitigate each malicious state. It has been deployed on a real IoT crowdsensing scenario with a Raspberry Pi acting as a spectrum sensor. In more detail, the Raspberry Pi has been infected with different samples of command and control malware, rootkits, and ransomware to later select between four existing MTD techniques. A set of experiments demonstrated the suitability of the framework to learn proper MTD techniques mitigating all attacks (except a harmful rootkit) while consuming <1 MB of storage and utilizing <55% CPU and <80% RAM.
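The abstract does not detail the learning algorithm, so the sketch below is only a generic illustration of online RL over fingerprint-derived states: a tabular Q-learning agent picking one of four MTD techniques per observed malicious state. The state labels, technique names, and reward signal are all placeholders.

```python
import random
from collections import defaultdict

ACTIONS = ["ip_shuffle", "file_relocation", "lib_rotation", "service_restart"]  # placeholder MTD techniques
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2

q_table = defaultdict(lambda: [0.0] * len(ACTIONS))

def select_action(state: str) -> int:
    """Epsilon-greedy choice of an MTD technique for a fingerprint-derived state."""
    if random.random() < EPSILON:
        return random.randrange(len(ACTIONS))
    values = q_table[state]
    return values.index(max(values))

def update(state: str, action: int, reward: float, next_state: str) -> None:
    """Standard Q-learning update after observing whether the attack was mitigated."""
    best_next = max(q_table[next_state])
    q_table[state][action] += ALPHA * (reward + GAMMA * best_next - q_table[state][action])

# One illustrative interaction: a ransomware-like fingerprint is observed,
# an MTD technique is deployed, and the reward encodes whether behavior returned to normal.
state = "ransomware_fingerprint"          # placeholder behavioral fingerprint label
action = select_action(state)
reward = 1.0                              # +1 if the device's behavior normalized, negative otherwise
update(state, action, reward, "benign_fingerprint")
print(ACTIONS[action], q_table[state])
```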
Automated cellular instance segmentation is a process that has been used to accelerate biological research for the past two decades, and recent advancements have produced higher-quality results with less effort from the biologist. Most current endeavors focus on cutting the researcher out of the picture entirely by generating highly generalized models. However, these models invariably fail when faced with novel data distributed differently than the data used for training. Rather than approaching the problem with methods that presume the availability of large amounts of target data and computing power for retraining, in this work we address the even greater challenge of designing an approach that requires minimal amounts of new annotated data as well as training time. We do so by designing specialized contrastive losses that leverage the few annotated samples very efficiently. A large set of results shows that 3 to 5 annotations lead to models that: 1) significantly mitigate the covariate-shift effects; 2) match or surpass the accuracy of other adaptation methods; and 3) even approach the accuracy of methods fully retrained on the target distribution. The adaptation training takes only a few minutes, paving a path towards a balance between model performance, computing requirements, and expert-level annotation needs.
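The abstract mentions specialized contrastive losses without defining them; the function below is only a generic supervised contrastive loss over per-instance embeddings (e.g. pixels grouped by instance or background label), included to illustrate the general family of objectives rather than the paper's specific losses.

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(embeddings: torch.Tensor,
                                labels: torch.Tensor,
                                temperature: float = 0.1) -> torch.Tensor:
    """Pull same-label embeddings together and push different-label embeddings apart.
    `embeddings` is (N, D), `labels` is (N,)."""
    z = F.normalize(embeddings, dim=1)
    sim = (z @ z.t()) / temperature                        # pairwise cosine similarities
    self_mask = torch.eye(z.size(0), dtype=torch.bool)
    sim = sim.masked_fill(self_mask, float("-inf"))        # exclude self-comparisons
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    pos_counts = pos_mask.sum(dim=1).clamp(min=1)
    loss = -log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1) / pos_counts
    return loss[pos_mask.any(dim=1)].mean()                # anchors with at least one positive

# Toy usage: 8 embeddings belonging to three (placeholder) instance labels.
emb = torch.randn(8, 32, requires_grad=True)
lab = torch.tensor([0, 0, 1, 1, 1, 2, 2, 2])
print(supervised_contrastive_loss(emb, lab))
```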
For applications that require processing large amounts of text at inference time, Large Language Models (LLMs) are handicapped by their limited context windows, which are typically 2048 tokens. In-context learning, an emergent phenomenon in LLMs of sizes above a certain parameter threshold, constitutes one significant example because it can only leverage training examples that fit into the context window. Existing efforts to address the context window limitation involve training specialized architectures, which tend to be smaller than the sizes at which in-context learning manifests, due to the memory footprint of processing long texts. We present Parallel Context Windows (PCW), a method that alleviates the context window restriction for any off-the-shelf LLM without further training. The key to the approach is to carve a long context into chunks (``windows'') that fit within the architecture, restrict the attention mechanism to apply only within each window, and re-use the positional embeddings among the windows. We test the PCW approach on in-context learning with models that range in size between 750 million and 178 billion parameters, and show substantial improvements for tasks with diverse input and output spaces. Our results motivate further investigation of Parallel Context Windows as a method for applying off-the-shelf LLMs in other settings that require long text sequences.
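The sketch below is a simplified illustration of the mechanism described above, not the authors' implementation: the context is split into windows, context tokens attend only within their own window, the same positional indices are reused across windows, and the trailing task tokens attend to everything (causally).

```python
import torch

def parallel_context_window_inputs(window_len: int, num_windows: int, num_task_tokens: int):
    """Build position ids and a boolean attention mask implementing the PCW idea in a
    simplified form: each context window reuses positions 0..window_len-1 and only
    attends to itself; the trailing task tokens attend to all windows and to themselves."""
    ctx_len = window_len * num_windows
    total = ctx_len + num_task_tokens

    # Re-used positional indices: 0..W-1 for every window, then continuing for task tokens.
    position_ids = torch.cat([
        torch.arange(window_len).repeat(num_windows),
        torch.arange(window_len, window_len + num_task_tokens),
    ])

    mask = torch.zeros(total, total, dtype=torch.bool)
    for w in range(num_windows):                       # block-diagonal, causal context attention
        s, e = w * window_len, (w + 1) * window_len
        mask[s:e, s:e] = torch.ones(window_len, window_len, dtype=torch.bool).tril()
    mask[ctx_len:, :ctx_len] = True                    # task tokens see every window
    mask[ctx_len:, ctx_len:] = torch.ones(num_task_tokens, num_task_tokens,
                                          dtype=torch.bool).tril()
    return position_ids, mask

pos, attn = parallel_context_window_inputs(window_len=4, num_windows=3, num_task_tokens=2)
print(pos.tolist())   # [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 4, 5]
print(attn.int())
```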
Dual encoders are now the dominant architecture for dense retrieval. Yet, we have little understanding of how they represent text, and why this leads to good performance. In this work, we shed light on this question via distributions over the vocabulary. We propose to interpret the vector representations produced by dual encoders by projecting them into the model's vocabulary space. We show that the resulting distributions over vocabulary tokens are intuitive and contain rich semantic information. We find that this view can explain some of the failure cases of dense retrievers. For example, the inability of models to handle tail entities can be explained via a tendency of the token distributions to forget some of the tokens of those entities. We leverage this insight and propose a simple way to enrich query and passage representations with lexical information at inference time, and show that this significantly improves performance compared to the original model in out-of-domain settings.
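A minimal sketch of the kind of interpretation described above, with a toy vocabulary and random weights standing in for a real dual encoder: project a dense query or passage vector onto the vocabulary by multiplying with the model's token-embedding matrix, apply a softmax, and inspect the highest-probability tokens.

```python
import torch

vocab = ["the", "capital", "of", "france", "paris", "city", "##s", "[unused]"]
hidden_dim = 16

# Stand-ins for a dual encoder's dense representation and its token-embedding matrix.
dense_vec = torch.randn(hidden_dim)
token_embeddings = torch.randn(len(vocab), hidden_dim)

# Project the dense vector into vocabulary space and normalize into a distribution.
logits = token_embeddings @ dense_vec
probs = torch.softmax(logits, dim=0)

top = torch.topk(probs, k=3)
for p, i in zip(top.values.tolist(), top.indices.tolist()):
    print(f"{vocab[i]:10s} {p:.3f}")   # the tokens the representation "talks about"
```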
Gait recognition, which identifies individuals based on their walking patterns, is an important biometric technique since it can be observed from a distance and does not require the subject's cooperation. Recognizing a person's gait is difficult because of the appearance variants in human silhouette sequences produced by varying viewing angles, carrying objects, and clothing. Recent research has produced a number of ways for coping with these variants. In this paper, we present the use of 3-D body shapes inferred from limited images, which are, in principle, invariant to the specified variants. Inference of 3-D shape is a difficult task, especially when only silhouettes are provided in a dataset. We provide a method for learning 3-D body inference from silhouettes by transferring knowledge from a 3-D shape prior learned from RGB photos. We use our method on multiple existing state-of-the-art gait baselines and obtain consistent improvements for gait identification on two public datasets, CASIA-B and OUMVLP, on several variants and settings, including a new setting of novel views not seen during training.
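The transfer described above can be read as a distillation setup; the sketch below is only a generic illustration of that idea (the toy linear "teacher" and "student", the 64x64 inputs, and the shape dimensionality are placeholders, not the paper's architecture).

```python
import torch
import torch.nn as nn

# Toy stand-ins: a frozen "teacher" predicting body-shape parameters from RGB,
# and a silhouette "student" trained to match it (generic distillation, not the paper's model).
SHAPE_DIM = 10                     # e.g. SMPL-like shape coefficients (placeholder)

teacher = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, SHAPE_DIM)).eval()
student = nn.Sequential(nn.Flatten(), nn.Linear(1 * 64 * 64, SHAPE_DIM))

rgb = torch.randn(4, 3, 64, 64)           # RGB frames (teacher input)
silhouette = torch.randn(4, 1, 64, 64)    # corresponding binary silhouettes (student input)

with torch.no_grad():
    target_shape = teacher(rgb)           # 3-D shape prior distilled from RGB

pred_shape = student(silhouette)
distill_loss = nn.functional.mse_loss(pred_shape, target_shape)
distill_loss.backward()                   # gradients flow only into the student
print(float(distill_loss))
```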